The Current Status of Projective Tests
McGrath, R. E., & Carroll, E. J. (2012). The current status of “projective” “tests.” In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology, Vol. 1. Foundations, planning, measures, and psychometrics (pp. 329–348). American Psychological Association. https://doi.org/10.1037/13619-018
CHAPTER 17
THE CURRENT STATUS OF “PROJECTIVE” “TESTS”
Robert E. McGrath and Elizabeth J. Carroll
The term projective tests is often used to encompass a variety of procedures that allow the target individual to provide free-form responses to ambiguous stimuli. The participant’s responses are thought to be sensitive to implicit processes, and consequently they may be somewhat resistant to efforts at misrepresentation.
This class of instruments has had a particularly checkered past. Because of concerns about honesty in responding to self-report measures, and the psychoanalytic belief that much of mental activity is resistant to self-observation, psychologists became enamored with the potential of projective instruments. The development of the Rorschach Inkblot Method (Rorschach, 1921/1942) preceded formal discussions of projective psychological tests, but its popularity in the United States is largely attributable to its presumed projective qualities. The Rorschach was soon joined by other instruments, including the Thematic Apperception Test (TAT; Morgan & Murray, 1935; Murray, 1943), the Rosenzweig (1978) Picture–Frustration Study, and the Szondi Test (Deri, 1949). Even tests developed for other purposes came to be used as projectives, particularly the Bender Visual Motor Gestalt Test (Hutt, 1985). A 1959 survey found the three most commonly used psychological tests in clinical practice were projective tests (Sundberg, 1961).
By the 1960s, though, the allure was fading for two reasons. One was the general critique of traditional personality assessment that emerged out of behaviorism. Mischel (1968) questioned whether
the criterion-related validity of personality measures was sufficient to justify their use, whereas Goldfried and Kent (1972) criticized the practice of using latent constructs to account for associations between test behavior and behavioral outcomes. The second factor was a psychometric critique of projective methods (e.g., Cronbach, 1949; Entwisle, 1972; Swensen, 1957, 1968).
This second literature has engendered an enduring negative perception of projective instruments in the scientific community. Although surveys in the 1980s and 1990s found more than 75% of clinical doctoral programs required training in projective testing (Piotrowski & Keller, 1984; Piotrowski & Zalewski, 1993), a more recent update saw that rate drop to 59%, with more than half of program directors reporting reduced training in projectives (Belter & Piotrowski, 2001). A recent attempt to generate a list of discredited psychological tests was largely dominated by projective instruments (Norcross, Koocher, & Garofalo, 2006). Only a few instruments, primarily the TAT and Rorschach, continue to appear with regularity in the assessment research literature. The continuing popularity of the former can be traced at least in part to its successful use in motivational research (e.g., McAdams, 1982); that of the latter is directly attributable to the success of Exner’s (2003) Comprehensive System, which brought uniformity in administration and scoring, normative data, and interpretation to the Rorschach.
In contrast, although clinicians are administering fewer tests, largely because of managed care (Piotrowski, 1999), the popularity of the Rorschach, the TAT, and figure drawings relative to other instruments has remained consistent over a period of decades (Archer & Newsom, 2000; Lubin, Larsen, & Matarazzo, 1984; Musewicz, Marczyk, Knauss, & York, 2009). The disparity between clinical and academic attitudes underlies the discussion of broadband measurement in the section Implications.

We are grateful to Luke Mason for our list of commonly used anachronistic labels.
The remainder of this chapter summarizes the current scientific status of projective instruments. The first section offers a conceptual analysis of the nature of projective assessment. Drawing on recent discussions of projective assessment and comparisons with other psychological measurement methods, it is suggested that applying both the word projective and the word test to these instruments is problematic and probably should be discontinued.
Current evidence on each of three projective instruments—the Rorschach, TAT, and figure drawings—is reviewed. Although other projective instruments are used in clinical assessment, particularly various forms of Incomplete Sentences Blank (Rotter, Lah, & Rafferty, 1992), these three techniques are the most extensively researched and effectively reflect the current status of projective instruments.
WHAT WE TALK ABOUT WHEN WE TALK ABOUT “PROJECTIVE” “TESTS”
Murray (1938) and L. K. Frank (1939) provided the seminal works on what is called the projective hypothesis. They hypothesized that free-format responding to ambiguous or “culture-free” (L. K. Frank, 1939, p. 389) stimuli would encourage the emergence of personal meanings and feelings. The labeling of certain instruments as projective also provided a clever phonetic contrast to “objective” measures, such as rating scales that restrict the set of acceptable response alternatives. The prototypical projective instrument demonstrates the following features:
- Test stimuli are ambiguous in some important way. For example, the Rorschach Inkblot Method presents the respondent with a fixed series of inkblots and the question, “What might this be?” The TAT requires the respondent to create a story on the basis of a picture in which people are engaged in uncertain behavior.
- Although some responses are incompatible with the instructions, for example, refusing to respond to a Rorschach card (in some instructional sets) or saying it is an inkblot, the number of acceptable responses to the stimuli is infinite. Traditional Rorschach practice even allows the individual to decide how many responses to make to each inkblot.
- The use of ambiguous stimuli is intended to elicit idiosyncratic patterns of responding, such as unusual percepts or justification for those percepts on the Rorschach, or unusual story content or story structure on the TAT.
- Because of their free-response format, projective instruments often require individual administration and specialized training in administration and scoring.
The Problem With Projection
The use of the term projective carries with it certain implications about the cognitive process that determines important test behavior, implications that have been questioned in recent years by individuals closely associated with the study of the Rorschach (e.g., Exner, 1989; Meyer & Kurtz, 2006). To understand why this shift has occurred, it is important to recognize at least three problems associated with calling these instruments “projective.”
The ambiguity of the word projective. A substantial literature now exists demonstrating that nonconscious activity molds our conscious thoughts, feelings, and intentions. As a result, advocates of psychoanalysis have asserted that modern cognitive science has corroborated the Freudian model of the unconscious (e.g., Westen, 1998). Kihlstrom (2008) has argued persuasively against this conclusion. Freud may have popularized the importance of unconscious activity, but it was already a widely respected hypothesis. What Freud added was the suggestion that adult mental activity is largely determined by primitive, selfish, and repressed wishes and feelings. Cognitive research has little to say about this more specific hypothesis.
Similar issues arise surrounding the concept of projection in connection with psychological instruments. Freud (1896/1962) used the term first to refer to a specific defense mechanism characterized by the unconscious attribution of one’s unacceptable feelings and wishes to some external object or individual. Later he used the term in a more general sense to encompass any idiosyncratic construction of environmental stimuli (Freud, 1913/1990). It was this latter use of the term Murray (1938) referenced when he drew the connection between responding to psychological tests and psychoanalytic theory. Meehl’s (1945) classic proposal that even objective tests have a dynamic aspect reflects the same understanding (see also L. K. Frank, 1948; Rapaport, 1946), although he was wise enough to put the word projection in quotes to reflect its ambiguity.
As in the case of the unconscious, there is nothing uniquely Freudian about the general proposition that different people construe stimuli differently, that ambiguity in the stimulus field can contribute to individual differences in stimulus responding, and that those differences can reveal something important about the individual. Once that proposition is couched in terms of projection, however, it takes on an ambiguous Freudian connotation.
The characterization of respondent behavior. If the concept of projection is unnecessary for understanding instruments such as the Rorschach, it is also clearly not sufficient. On the basis of prior attempts to define the scope of potentially interesting respondent behaviors to projective instruments (e.g., Bellak, 1944; Exner, 1989; McGrath, 2008; Weiner, 1977) as well as personal experience with these instruments, we would suggest that at least six sources of information can be observed using a projective instrument (see Table 17.1), although these tend to be of varying importance across projective instruments and respondents.
Thematic material refers to the degree to which responses contain language or phrasing that reflects certain attitudes or emotional states. This is the information source that comes closest to the concept of projection, in that respondents may use words reflecting issues of particular concern for them, but these concerns need not be unavailable to consciousness.
Exner (1989, 2003) has argued that Rorschach (1921/1942) was particularly interested in his instrument as a method for detecting perceptual idiosyncrasies. This bias is evident in Rorschach’s original instructional set, “What might this be?” It is also evident in his creation of the inquiry phase, an important and distinctive element of Rorschach administration in which the respondent is asked to explain how each response was formulated. Although perceptual idiosyncrasies play a particularly central role in the Rorschach, they can be important for any projective instrument in which the respondent is expected to respond to stimulus materials, for example, when the respondent clearly ignores or distorts a central element of a TAT picture. Although such distortions can reflect a disordered perceptual style, perhaps suggestive of thought disorder, it is hypothesized that they can also suggest issues discomforting the respondent.
TABLE 17.1
Information Sources Available Through Projective Tests
| Source | Examples |
|---|---|
| Thematic material | Morbid themes (R and T); stories that focus on achievement (T) |
| Perceptual idiosyncrasies | Poor form quality (R); preoccupation with small details (R and T); omission of critical stimulus elements (R and T) |
| Extratest behavior | Card rotation (R); attempts to reject stimuli (R and T) |
| Self-descriptive statements | Indications of task-related discomfort or enjoyment (R and T) |
| Quality of thought | Illogical justification of percepts (R); tangentiality (R and T) |
| Quality of speech | Vocabulary, rhyming, or use of clang associations (R and T) |

Note. R = Rorschach Inkblot Method variable; T = Thematic Apperception Test variable.
Extratest behavior encompasses anything other than responses based on the instructional set, including the manner in which the person handles the physical stimuli as well as behaviors such as odd mannerisms and expressions of resistance. Of these, expressly self-descriptive statements are significant enough to mention as a distinct source of information. Because of its free-response format, the Rorschach or TAT gives the respondent license to make statements providing clues about cardinal traits or distinctive ways of understanding themselves. This provides a complementary perspective to the standardized approach to trait description offered by objective instruments. In practice, however, self-descriptive statements during administration of projective instruments are often restricted to the respondent’s reactions to the instrument.
Quality of thought refers to the logic or reasonableness of the thought processes evidenced during the administration. Finally, quality of speech encompasses various factors associated with effectiveness of communication, including the length of responses, complexity and precision of the language used, and so forth.
The role of ambiguity. It is worth speculating whether the emphasis on projection has led to misleading conclusions about the relationship between ambiguity and clinical usefulness, although the empirical basis for this point is thin. One of the corollaries of L. K. Frank’s (1939) projective hypothesis was that greater ambiguity is associated with greater potential for projection. Perhaps the most extreme product of this proposition is Card 16 of the TAT, which is a white card for which the respondent is instructed to both imagine a picture and tell a story about that picture.
In fact, although some evidence indicates that responding to Card 16 is related to creativity (Wakefield, 1986), clinicians find the stories are often less interesting than those provided to other cards (Groth-Marnat, 2009). Similarly, recent evidence indicates Rorschach (1921/1942), who was something of an artist, touched up his original inkblots in a manner that made them more evocative of certain percepts (Exner, 2003)—that is, he made them less ambiguous and more culture-bound than they were
originally. It may well be the case that the evocative elements introduced by Rorschach’s modifications of the blots engage the respondent more than would a completely amorphous blot and that this feature has contributed to the continuing preference for the Rorschach inkblots over more psychometrically defensible alternatives, such as the Holtzman Inkblot Test (Holtzman, Thorpe, Swartz, & Herron, 1961).
Implications
Taking these three arguments together, the conclusion has started to emerge within the community of clinical researchers interested in projective instruments that projection is not a necessary or even a particularly important contributor to the clinical value of such instruments. To state the overall conclusion simply, these measures seem to be interesting not because they are projective but because they are provocative.
McGrath (2008) discussed alternative conceptualizations for instruments traditionally considered projective. They can be distinguished from self-report measures in that they are performance-based or behavioral measures (McDowell & Acklin, 1996) in which the style of responding to the stimuli represents the behavior of interest.
They can be distinguished from performance-based measures of ability, such as intelligence tests, in that they are relevant to a person’s characteristic manner of interpreting environmental stimuli, a concept thought to have relevance to understanding personality or interpersonal style. Although some variables may suggest the possible presence of a specific mental disorder—for example, certain perceptual or thinking irregularities can reflect a psychotic disorder—projective instruments are generally intended to reveal something about the respondent’s characteristic style of interacting in the world.
Finally, they can generally be distinguished from other performance-based measures of characteristic style in that they are broadband in emphasis. A broadband measure is one that is sensitive to multiple latent constructs (Cronbach & Gleser, 1957). For example, one Rorschach variable described by Exner (2003) is Personalized Answers, which is the justification or clarification of a percept by reference to personal knowledge. Personalization can occur
for various reasons, including discomfort with the Rorschach or with the testing in general, insecurity about one’s abilities, an attempt to assert superiority over the tester through demonstration of personal knowledge, or narcissistic tendencies (Weiner, 2003).1
A number of more narrowband behavioral measures of characteristic style have become popular in recent years. These include behavioral measures such as the Strange Situation (Ainsworth, Blehar, Waters, & Wall, 1978) and the implicit measures that have emerged out of cognitive and social psychology, such as the Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998) or the Stroop color word test (Phaf & Kan, 2007). The IAT is a more narrowband instrument because any one administration is intended to investigate speed of association between only two constructs. In contrast, as Table 17.1 attests, the less structured format of the Rorschach permits a much broader array of issues to emerge.
The broadband nature of projective instruments has several important implications. First, Weiner (1994) criticized the common practice of referring to the Rorschach as a test, which implies a fixed scoring protocol for purposes of detecting a specific latent variable. Instead, results of a Rorschach administration are more akin to the transcript of a therapy session. That is, the Rorschach is a technique or method of data collection that can be explored from multiple perspectives. For this reason, Weiner recommended referring to the Rorschach Inkblot Method. The dominance of the Comprehensive System has muddied this distinction between the Rorschach as a test and technique, but keep in mind that the Comprehensive System does not represent the only approach possible to deriving quantitative data from the Rorschach. This analysis suggests the TAT is not a test despite its name.
Second, when Cronbach and Gleser (1957) applied the concept of bandwidth to psychological testing, they introduced a second concept intended to raise cautions about broadband instruments. A broadband instrument is efficient as a screening device for a variety of personal attributes, and for
this reason can be a potentially valuable tool when the goal of testing is to generate a description of the respondent. However, they also noted that broadband measures tend to be low in fidelity, the degree to which a firm inference can be generated about a respondent on the basis of an outcome. Whereas relatively narrow-bandwidth measures such as self-report tests with a high level of internal consistency tend to be higher in fidelity, and are therefore more easily interpreted, a score on a broadband variable can be ambiguous. It is the popularity of such relatively broadband, relatively low-fidelity instruments in clinical practice that accounts for references to clinical assessment as an art.
So what are we to call this set of sorely misunderstood instruments? A generally accurate term would be broadband performance-based measures of characteristic style, but this is quite a mouthful and unlikely to catch on. A second option would be to retain the simple term projective, because traditions often die hard, but with the recognition that—as in the case of starfish and the Pennsylvania Dutch—the term’s use reflects precedent rather than precision. This approach still implies a necessary connection with psychodynamic theory, however.
We recommend discontinuing references to “projective” “tests” and using instead the terminology broadband implicit techniques (BITs; see also Fowler & Groat, 2008). This terminology is compact yet captures several key features of BITs: They are intended to access multiple information channels, they potentially access automatic or poorly self-observed mental activities that contribute to social identity, and they are primarily data-gathering techniques rather than standardized tests.
If it is true that broad bandwidth implies low fidelity, it has important implications for the empirical status of BITs. To the extent that BITs are useful in clinical screening, there is a strong rationale for research evaluating their reliability and validity. To the extent that BITs are ambiguous in meaning, however, the rationale is poor for using them as primary indicators of respondent status on a specific latent variable. One challenge facing proponents of BITs is to identify which latent variables these instruments reflect with sufficient fidelity to justify their use as operationalizations in research contexts, for example. This challenge will be addressed in the context of a general psychometric evaluation of the three most popular BITs.

1Broadband measures, where a single variable can reflect multiple latent constructs, should not be confused with multidimensional measures such as inventories that consist of multiple scales, each of which taps a single latent construct. The Rorschach is both multidimensional and broadband.
THE PSYCHOMETRIC STATUS OF THE RORSCHACH
Reliability
Different authors have reached different conclusions about the reliability of the Rorschach, primarily because of different standards for adequate reliability. Some authors contend that reliability values of less than .85 are unacceptable, particularly when an instrument is used for clinical purposes (Nunnally & Bernstein, 1994; Wood, Nezworski, Lilienfeld, & Garb, 2003). In support of this argument, it has been noted that many intelligence test scores meet this standard. However, a much more common recommendation suggests reliability values of .60 or greater are acceptable and values of .80 or higher are considered exceptional (Cicchetti & Sparrow, 1981; Fleiss, 1981; Landis & Koch, 1977; Shrout, 1998). It is noteworthy that the manual for the Minnesota Multiphasic Personality Inventory (MMPI; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) reports 26 internal reliability statistics for the 13 original broadband scales, some of the most extensively used scales in all of clinical assessment. Of those, 38% were less than .60, and another 35% were in the range .60 to .80, suggesting broadband instruments can be clinically valuable even when they do not meet desirable standards for reliability.
Interrater reliability. In the case of a behavioral observation technique such as the Rorschach, achieving consistency across settings requires consistency in administration and generating variables. For the Rorschach, the latter activity can be further subdivided into two phases. The first has to do with the assignment of a series of codes to individual responses, the second with the aggregation of code
results into variables (although this second part is increasingly accomplished via software). The first phase can be referred to as response coding; the second as protocol scoring.
Before the emergence of the Comprehensive System, psychoanalytic assumptions about the inevitable emission of conflictual material in response to ambiguous stimuli and competing perspectives on what constitutes optimal administration led to a crazy quilt of Rorschach practices. For example, Exner and Exner (1972) found that more than 20% of surveyed clinicians did not score the Rorschach at all, and another 59% had personalized scoring by choosing elements from different instructional sets. The Comprehensive System, which began as an attempt to integrate elements from five different approaches to the Rorschach, successfully brought uniformity to the technique.
Even within the Comprehensive System, however, interadministrator reliability may still be an important issue. Lis, Parolin, Calvo, Zennaro, and Meyer (2007) found more experienced administrators tended to produce richer, more complex protocols. This finding is likely reflective of differences in the inquiry phase of the testing.2 Accordingly, this relationship between experience and richness may be unique to the Rorschach among commonly used assessment instruments.
Consistent with tradition, most of the literature on interrater reliability has focused on coding and scoring. Critical evaluations of the Comprehensive System have raised two key issues concerning its reliability. First, the original reliability data for the system were analyzed using percent agreement, which is not a true reliability statistic (McDowell & Acklin, 1996; Wood, Nezworski, & Stejskal, 1996). Second, many reliability studies are conducted in laboratory settings in which raters know their ratings will be evaluated, an awareness that would presumably contribute to scrupulousness. Field reliability refers to rating consistency in applied settings in which the rater is unaware the results will be evaluated. This is a potential problem for any clinical instrument (e.g., see Vatnaland, Vatnaland, Friis, & Opjordsmoen, 2007), but Hunsley and Bailey (1999) suggested several reasons why the field reliability of the Comprehensive System could be particularly poor. In response, McGrath (2003) offered several reasons why the field reliability of the Comprehensive System might be no worse than typical.

2In some ways the inquiry phase is more like the testing of limits often discussed in connection with the clinical use of performance-based psychological instruments than a standardized element of administration, because administrators decide on the number and focus of the questions they ask about each response. Although clinically interesting, results from testing of limits generally do not contribute to quantitative outcomes on other tests. The inquiry phase, in contrast, is an integral contributor to Rorschach coding.
These concerns inspired three large-scale studies of Comprehensive System interrater reliability using kappa coefficients for categorical response codes and intraclass correlation coefficients (ICCs) for dimensional protocol scores (Acklin, McDowell, Verschell, & Chan, 2000; McGrath et al., 2005; Meyer et al., 2002). In all three cases, the mean and median values for both kappa and the ICC were well within the acceptable range. However, the results merit more detailed analysis.
First, given the focus in the Rorschach on idiosyncratic responding, it is not surprising to find that some codes were extremely skewed. For example, less than 1% of responses involve an X-ray (McGrath et al., 2005; Meyer et al., 2002). McGrath et al. (2005) demonstrated that codes with very low base rates were associated with both poorer reliability and greater variability on average. In some studies, the authors chose to omit skewed codes or scores on the basis of skewed codes from their presentation of the results; in other cases, these codes were included.3
It is clear that codes of low reliability should be interpreted with caution. It is unclear, however, to what extent this factor undermines the Comprehensive System in practice. Some low-reliability codes are only interpreted after aggregation with other codes (see McGrath et al., 2005). Similarly, the Comprehensive System sometimes draws distinctions that do not enter into either the scoring or interpretation of the protocol, for example, whether a response was determined more by considerations of form or texture. Finally, the mean and median values for all four samples used in the three large-scale reliability studies were more than adequate,
and consistent with results for other scales commonly used in clinical practice.
Two of the large studies of Comprehensive System reliability provided data relevant to the issue of field reliability. The first rater in the McGrath et al. (2005) study believed the results would be used only for clinical purposes. Meyer et al.’s (2002) Sample 4 was collected under similar circumstances. The results from these samples were consistent with those collected under laboratory conditions, suggesting poor field reliability at least is not inevitable in the Comprehensive System.
One final point of concern was that mean reliability values for the Acklin et al. (2000) study (.73–.80) were somewhat lower than those in the other two studies (.79–.91). This difference raises the important issue that, as with any test, interrater reliability is setting specific. Despite this caution, the overall pattern of results indicates that adequate reliability is at least possible for Comprehensive System variables, even in field settings.
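The two statistics used in these interrater studies can be sketched in a few lines of plain Python. This is an illustration only: the code and data are hypothetical, and the particular ICC form shown, ICC(3,1), is one common choice for consistency between fixed raters rather than a reconstruction of the published analyses.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement (Cohen's kappa) for two raters
    assigning one categorical code per response."""
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

def icc_3_1(scores):
    """ICC(3,1): two-way model, consistency, single rater.
    scores is a list of rows, one row of k raters' scores per protocol."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
```

Kappa would apply to categorical response codes (e.g., two raters coding the same set of responses for a determinant), whereas the ICC would apply to the dimensional protocol scores formed by aggregating codes.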
Although the Comprehensive System is the dominant approach to the Rorschach, many Rorschach scales were created outside the context of the system. Several reviews have now demonstrated good reliability for nonsystem variables thought to measure aggression (Gacono, Bannatyne-Gacono, Meloy, & Baity, 2005; Katko, Meyer, Mihura, & Bombel, 2009), individuation in relationships (Bombel, Mihura, & Meyer, 2009), therapy prognosis (Handler & Clemence, 2005), and dependency (Bornstein & Masling, 2005), among others.
Internal reliability. The concept of internal reliability has not generally been applied to the Rorschach. It is possible to compute coefficient alpha for Rorschach codes if each card is treated as an observation and the presence–absence or frequency of each code on each card is treated as the outcome. An example of this approach is provided by Bornstein and Masling (2005), who found reliabilities of .61 to .62 for a measure of dependency. Internal reliability is likely to be poor for many Rorschach variables, however, given that the number and content of responses are allowed to vary freely.

3Another option involves grouping codes into logical subsets called segments and basing kappa on agreement within the segment (Hilsenroth, Charnas, Zodan, & Streiner, 2007; McDowell & Acklin, 1996; Meyer, 1997a). Although segmenting the codes produces less skewed variables, scoring and interpretation are based on individual codes rather than segments, so segmenting may avoid skew but produces reliability estimates without relevance to applied practice. It may also be noted that the issue of skew is generally recognized as a problem in reliability estimation, resulting in the development of various statistics for evaluating consistency that are less sensitive to skew. Such statistics have been criticized as inconsistent with the psychometric definition of reliability (see Shrout, Spitzer, & Fleiss, 1987).
It would also be possible to evaluate the internal reliability of Exner’s (2003) global indexes. For example, his Perceptual Thinking Index aggregates information on the basis of several indicators of illogic or perceptual inaccuracy. To our knowledge such analyses have never been conducted. As in the case of the original MMPI scales, these indexes are explicitly formative measures (Edwards & Bagozzi, 2000) that were developed on the basis of correlations with an external criterion rather than internal consistency, so it may be argued the concept of internal reliability is not particularly relevant.
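The per-card approach to coefficient alpha described in this section can be illustrated with a minimal sketch. The function name and data below are ours, not drawn from Bornstein and Masling (2005); the computation is the standard Cronbach's alpha with cards playing the role of items.

```python
def coefficient_alpha(items):
    """Cronbach's alpha. items is a list of k item-score vectors of
    length n; here each "item" would be one Rorschach card, and each
    score the frequency of a code in one respondent's responses to it."""
    k, n = len(items), len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Each respondent's total across the k "items" (cards)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))
```

Because the number and content of responses per card vary freely, per-card code frequencies are noisy item scores, which is consistent with the expectation stated above that internal reliability will be poor for many Rorschach variables.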
Test–retest reliability. The evaluation of test–retest reliability is particularly problematic in the context of the Comprehensive System. It requires setting the interval between administrations so there is a reasonable likelihood that the latent construct underlying the scale remains consistent. The interpretation of many Comprehensive System variables was derived actuarially, a technique that often leaves the latent construct poorly specified (McGrath, 2008). In some cases, Exner (2003) even developed his hypothesis about the meaning of a scale in part using evidence of the variable’s test–retest stability, an approach that reverses the usual pattern of test meaning determining the expected period of stability.
With these caveats in mind, the most extensive analysis of temporal stability in Rorschach variables was a meta-analysis conducted by Grønnerød (2003, 2006). As was the case for interrater reliability, mean correlations within variables over time were negatively skewed, with a small number falling below the value of .60. Within the Comprehensive System, many of the variables demonstrating poor test–retest reliability are interpreted largely in combination with other variables. For example, the Color-Form code (indicating a response mainly determined by color and secondarily by inkblot shape) was associated with a mean test–retest correlation of only .53, and the Color code (indicating a response determined purely by color with no
reference to form) was associated with a mean correlation of .57. Interpretation is based on the sum of these two codes, which was associated with a mean correlation of .76. Other variables such as Inanimate Movement (indicating a response involving movement by a nonliving object) and Diffuse Shading (indicating a response based on light–dark variation in the blot) are considered indicators of state latent variables. Despite these exceptions, Grønnerød estimated that the mean 6-month correlation for Rorschach variables was more than .70.
Validity
Several approaches to evaluating the validity of the Rorschach are flawed and will be dispensed with quickly. The first examines convergence between the Rorschach and a self-report indicator, such as the MMPI. Results of these efforts have been disappointing, with little evidence of correlation between scales that seem to be measuring similar constructs (Archer & Krishnamurthy, 1993; Lindgren, Carlsson, & Lundbäck, 2007). Several hypotheses have been suggested to explain this failure to converge (e.g., Meyer, Riethmiller, Brooks, Benoit, & Handler, 2000), the most compelling of which suggests that psychologists tend to expect greater convergence across modes of functioning (McGrath, 2005) or methods of measurement (Bornstein, 2009) than tends to be the case. Research supporting this objection to using cross-method convergence as validity evidence has demonstrated the problem is endemic to implicit measures (Gawronski, LeBel, & Peters, 2007). In fact, good evidence indicates that implicit measures do not even converge among themselves, suggesting that each may tap a relatively discrete element of the construct (Nosek, Greenwald, & Banaji, 2007; Ziegler, Schmukle, Egloff, & Bühner, 2010). The failure to converge with self-report even suggests that BITs could demonstrate good incremental validity over self-report, a possibility that will be considered in the section Clinical Utility.
A second problematic approach involves meta-analyses generating a global estimate of validity across Rorschach variables and criteria (L. Atkinson, 1986; L. Atkinson, Quarrington, Alp, & Cyr, 1986; Hiller, Rosenthal, Bornstein, Berry, & Brunell-Neuleib, 1999; Parker, 1983; Parker, Hanson, &
Hunsley, 1988). These studies as a group support the conclusion that the Rorschach is an instrument of adequate validity and that it is on a par with the MMPI. Garb and his associates (Garb, Florio, & Grove, 1998, 1999; Garb, Wood, Nezworski, Grove, & Stejskal, 2001) have raised a number of methodological concerns about these studies. Some of their concerns are clearly justified, such as the omission of effect sizes from aggregates on the basis of the statistic used; others are more questionable, such as the omission of effect sizes on the basis of judgments of whether convergence would be expected (for responses from the meta-analysis authors, see Parker, Hunsley, & Hanson, 1999; Rosenthal, Hiller, Bornstein, Berry, & Brunell-Neuleib, 2001). The most problematic aspect of this line of research, however, is its global conclusions. It is universally accepted, for example, that the Rorschach is a valid indicator of thought disorder (e.g., Wood, Nezworski, & Garb, 2003). If 80% of published Rorschach studies focus on the prediction of thought disorder, a global meta-analysis could easily conclude the instrument as a whole is acceptably valid—even if only the variables related to thought disorder are valid. A more useful approach involves focused reviews targeting individual Rorschach variables. Such an analysis is in progress (Mihura, Meyer, Bombel, & Dumitrascu, 2010).
The substantial literature dedicated to the validity of Rorschach variables provides strong and
consistent evidence that the Rorschach can validly predict at least five aspects of characteristic style: disordered thinking, intelligence, effort or engagement in the task (although it is unclear to what extent this reflects a characteristic level of engagement versus a Rorschach-specific response), therapy prognosis, and dependence (see Table 17.2). The evidence for several other variables is suggestive of validity but insufficient for a firm conclusion.
Clinical Utility
Recent discussions of the clinical utility of the Rorschach have focused largely on two issues. First, the technique’s incremental validity over other commonly used evaluation techniques, particularly the clinical interview and the MMPI, has not been firmly established (Hunsley & Bailey, 1999). Insufficient evidence of incremental validity is particularly problematic in the case of the Rorschach because it is a costly instrument to use. Second, concerns have been raised about the normative data used to classify individuals for clinical purposes (Wood, Nezworski, Garb, & Lilienfeld, 2001).
Incremental validity. The information sources listed in Table 17.1 can be used to frame the debate over incremental validity. It is unclear to what extent a technique such as the Rorschach will improve over the clinical interview with regard to behavioral sampling, self-description, and the quality of thought and speech. It is a reasonable hypothesis that the emotional consequences of attempting to respond effectively to ambiguous stimuli can result in a different perspective than is provided by the interview, but this is an untested hypothesis.

TABLE 17.2
Valid Uses for the Rorschach

| Target | Example variables |
|---|---|
| Clearly valid | |
| Disordered Thinking | Form Quality^a (G. Frank, 1990); Thought Disorder Index (Holzman, Levy, & Johnston, 2005) |
| Intelligence | Response Complexity (Wood, Krishnamurthy, & Archer, 2003) |
| Effort/Engagement | Number of Responses^a (Meyer, 1997b) |
| Therapy Prognosis | Prognostic Rating Scale (Handler & Clemence, 2005) |
| Dependency | Oral Dependency Scale (Bornstein & Masling, 2005) |
| Probably valid | |
| Quality of Relationships | Concept of the Object (Levy, Meehan, Auerbach, & Blatt, 2005); Mutuality of Autonomy (Bombel et al., 2009) |
| Body Boundaries | Barrier and Penetration (O’Neill, 2005) |
| Ego Functioning | Ego Impairment Index (Stokes et al., 2003); Primary Process Scoring (Holt, 2005) |
| Distress | Morbid Responses^a (Mihura et al., 2010) |
| Organic Impairment | Piotrowski Signs (Minassian & Perry, 2004) |

^a Comprehensive System variables.
In contrast, the Rorschach elicits thematic material and perceptual idiosyncrasies in a manner quite distinct from that offered by a clinical interview or a self-report measure. However, one of the important implications of a shift toward a cognitive rather than psychoanalytic perspective on implicit activity is the recognition that the activity may be easily self-observed (Gawronski et al., 2007; Kihlstrom, 2008). It is therefore important to evaluate whether implicit measures can enhance prediction over less expensive self-report measures.
Concerns about the incremental validity of the Rorschach have spurred several investigations in recent years. Dao, Prevatt, and Horne (2008) provided evidence that the Rorschach is a better predictor of psychosis than the MMPI. Although Dawes (1999) had earlier raised concerns about whether popular but complex Rorschach scores offer any incremental validity over simpler Rorschach scores for the evaluation of problems in thinking, this objection speaks not to the value of the Rorschach but to that of particular scores computed from it. The Rorschach has also shown superiority to the MMPI as a predictor of therapy outcome (Meyer, 2000). Though encouraging, this small literature is an insufficient basis for concluding that the relatively demanding Rorschach provides incremental validity over other methods. As one might expect, studies focusing on variables not listed in Table 17.2 have failed to support the Rorschach’s incremental validity (see Archer & Krishnamurthy, 1997; Garb, 1984). More research is needed on this topic before evidence of the Rorschach’s clinical utility is sufficient.
Normative data. Another issue of some concern in the literature on the clinical use of the Rorschach has to do with the accuracy of the normative data used to classify outcomes. Even if a Rorschach variable proves to be a valid predictor of reasonable criteria, the clinical results for that variable can
be inaccurate if the normative standards used to classify the case are incorrect. Wood et al. (2001) presented evidence suggesting that the standard Comprehensive System norms were too liberal, resulting in an excessive false positive rate for detecting psychopathology. This led to some debate about whether the original normative sample was unrepresentative of the general population (Wood, Nezworski, Lilienfeld, & Garb, 2003) or whether the population had shifted in the intervening years (Meyer, 2001). A related question was whether the original normative data gathered in the United States were sufficiently general to apply to residents of other nations (Mattlar, 2004).
General Conclusions
Evidence is sufficient, at least for the key variables used in interpretation of the Comprehensive System, to indicate interrater reliability can be adequate. The evidence suggests there is a small set of constructs for which the Rorschach is a clearly valid indicator, and an additional set for which there is decent evidence of validity. It is troubling how many of the variables listed in Table 17.2 are not included in the Comprehensive System. This state of affairs reflects Exner’s (2003) primary reliance on a select portion of the Rorschach literature as the inspiration for variables in the system.
Evidence also suggests that for thought disorder and therapy prognosis the Rorschach offers incremental validity over the MMPI. Additional research on incremental validity is warranted, particularly in comparison with other common instruments in addition to the MMPI.
The Rorschach continues to evolve, and substantial changes in recommended Rorschach practice are in process. Since John Exner’s death in 2006, a group of his colleagues has been developing a modified system intended to address many of the criticisms leveled against his work, to be called the Rorschach Performance Assessment System (RPAS; Meyer, Viglione, Mihura, Erard, & Erdberg, 2010). A fair amount of information about RPAS has already been released. It will include a revised set of administration instructions intended to reduce variability in the number of responses per protocol (Dean, Viglione, Perry, & Meyer, 2007), a revised
normative database using an international sample (Meyer, Erdberg, & Shaffer, 2007), modified scoring criteria for certain codes on the basis of psychometric considerations, and elimination of certain scores that seem to be invalid. Ideally, future validation of the RPAS will increase the basis for empirically informed Rorschach interpretation.
THE PSYCHOMETRIC STATUS OF THE TAT
Reliability
Concerns raised earlier about the importance of consistency in administration and scoring are particularly relevant in the case of the TAT. Murray (1943) prescribed an order for administering 20 cards to each respondent over two 50-minute sessions. Cost–benefit considerations led psychologists to reject that recommendation, but no standard alternative has emerged in its place. Clinicians and researchers have varied the number, content, and order of cards. Few clinicians engage in any formal scoring at all (Keiser & Prather, 1990; Pinkerman, Haynes, & Keiser, 1993), instead opting for a qualitative interpretation of uncertain validity. There are also alternative apperceptive pictures developed for special populations, such as children or African Americans, as well as more narrowband picture sets intended to detect specific motivations (e.g., Bellak & Bellak, 1949, 1996; McClelland, Atkinson, Clark, & Lowell, 1953; Roberts & Gruber, 2005; Thompson, 1949). There are even differences in whether the respondent delivers the story verbally or in writing, which should have significant effects on productivity. The circumstances for the TAT are similar to the chaos described by Exner and Exner (1972) for the Rorschach before the emergence of the Comprehensive System. Any efforts to discuss the reliability or validity of the apperception technique are therefore problematic.
Although based on different scoring systems and different sets of drawings, some general conclusions can be drawn about the reliability of the TAT and apperceptive techniques in general (e.g., Entwisle, 1972; Lundy, 1985; Meyer, 2004). First, interrater reliability for scoring has consistently been found to be acceptable, although demonstrations of field
reliability are unavailable. Second, test–retest reliability is often quite poor, frequently failing to reach the .60 level. Finally, internal reliability is also usually unacceptable.
This last issue has received particular attention, as advocates of apperceptive techniques have argued that internal reliability should not be expected. J. W. Atkinson (1981) argued that themes should emerge in a saw-toothed pattern, where the response to one card satiates a need, but the failure to satiate on one card will then stimulate it on the next. This approach rests on several troubling assumptions, among them that needs must be regularly satiated, and that storytelling produces a labile pattern of satiation and activation. A more defensible alternative was offered by Schultheiss, Liening, and Schad (2008), who concluded that internal reliability analysis is irrelevant to the TAT because it is predicated on the assumption that the cards represent interchangeable observations. Instead, it is the person–situation interaction that accounts for most of the variance in productivity. Although this explanation sounds reasonable, it fails to explain why test–retest reliability is also poor.
As one would expect given the nature of reliability statistics, an important consideration in achieving adequate internal reliability on the TAT is the number of cards administered. Hibbard, Mitchell, and Porcerelli (2001) found that reliability coefficients on average were less than desirable for individuals administered four cards and did not consistently achieve acceptable levels except in their 10-card administration. The safest policy would therefore call for administering at least 10 cards, with five cards considered a bare minimum.
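The relation between test length and reliability described here is conventionally projected with the Spearman–Brown prophecy formula, which estimates the reliability of a lengthened test from the reliability of a shorter one. The sketch below illustrates this with a hypothetical starting value of .50 for a four-card administration; that figure is assumed for the example and is not one reported by Hibbard, Mitchell, and Porcerelli (2001).

```python
# Spearman-Brown prophecy formula: projected reliability when a test
# is lengthened by a factor n (here, n = new card count / old card count).
# The .50 starting value for a 4-card set is a hypothetical figure for
# illustration, not an empirical result from the TAT literature.

def spearman_brown(reliability, n):
    """Project reliability of a test lengthened by factor n."""
    return n * reliability / (1 + (n - 1) * reliability)

r4 = 0.50  # assumed reliability of a 4-card administration
for cards in (5, 10, 20):
    n = cards / 4
    print(cards, round(spearman_brown(r4, n), 2))
# 5 cards -> 0.56, 10 cards -> 0.71, 20 cards -> 0.83
```

Under this assumption, going from 4 to 10 cards lifts projected reliability from .50 to about .71, which is consistent with the pattern Hibbard et al. observed: acceptable levels were reached consistently only in the 10-card administration.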
Validity
A number of different relatively narrowband scoring systems exist for the TAT (see Jenkins, 2008). This review focuses on four systems that have been particularly well researched.
Defense Mechanisms Manual. The Defense Mechanisms Manual (DMM; Cramer, 1991) was constructed to assess three defense mechanisms: denial, representing the most primitive of the three; projection; and identification, representing the most mature. Cramer (e.g., 1999, 2009) has presented a great deal of evidence to support the validity of the DMM. In some cases, however, the justification for this evidence seems strained or counterintuitive (e.g., that level of defense should not be related to intelligence among preadolescents, or that patients with gender-incongruent forms of depression used identification more because of its relationship to identity formation). Conclusions about the validity of the DMM await independent corroboration of reasonable hypotheses about the functioning of the defenses underlying the three scales.
General Conclusions
The TAT cannot be considered a single technique at this time. A unified approach to the TAT would require a standardized, practical set of stimuli and instructions. It would also require the adoption of an empirically founded scoring system that respects the instrument’s broadband nature, addresses issues relevant to clinicians, and can be scored in a cost-effective manner. Fulfilling this set of conditions in the near future seems unlikely.
THE PSYCHOMETRIC STATUS OF FIGURE DRAWINGS
General Comments
Figure drawings differ from most other projective techniques in that they call for a physical rather than verbal response. The information sources listed in Table 17.1 are still relevant, although some modifications are in order. Elements of drawing style such as the use of heavily elaborated lines can be conceptualized as consistent with thematic materials as implicit indicators of emotional issues, whereas unusual details such as omitting windows from a house are similar to the perceptual idiosyncrasies found in TAT and Rorschach responding. The observation of extratest behavior and self-descriptive statements remains potentially useful, and the quality of thought and speech can be evaluated if the administrator tests limits by asking questions about the drawings.
Several figure-drawing techniques have been particularly popular. The House–Tree–Person (H-T-P; Buck, 1948) calls for drawings of the three objects listed, each on a separate piece of paper. The Draw A Person (DAP; Machover, 1949) requires drawing a person, then a person of the opposite sex. A more recent alternative is the Kinetic Family Drawing (KFD; Burns & Kaufman, 1972), which involves drawing a picture of one’s family doing something.
Figure drawings remain popular clinical instruments. They are easily administered to almost any individual. They also involve a familiar task that helps reduce anxiety about the testing, particularly in children. At the same time, they have suffered the most radical decline in respectability of the three BITs discussed in this chapter. Early work relied heavily on a sign approach, in which unusual drawing details were individually taken as evidence of a latent construct in a manner that relied heavily on psychoanalytic assumptions about the projection of unconscious conflicts onto ambiguous stimuli. This approach has been largely rejected (Joiner, Schmidt, & Barnett, 1996; Swensen, 1957, 1968), even by proponents of figure drawings as a clinical tool (e.g., Riethmiller & Handler, 1997).
Several scoring systems have since emerged for combining unusual drawing details and stylistic elements, usually with the goal of evaluating overall level of emotional distress. The best known of these methods were created by Koppitz (1968) and Naglieri, McNeish, and Bardos (1991), both of which are applied to the DAP. Accordingly, this review will focus on research evaluating these scoring systems, although it is uncertain whether the aggregative approach has superseded the sign or qualitative approach in the clinical use of figure drawings.
Reliability and Validity
Kinetic Family Drawings. Burns and Kaufman (1972) offered guidelines for scoring the KFD, but these were ambiguous and unsupported by research. Various scoring systems were suggested in subsequent years, but none has been subjected to more than a handful of empirical tests. Although these systems demonstrate adequate interrater reliability, evidence that they can identify children with
emotional difficulties is weak (Cummings, 1986; Knoff & Prout, 1985). Tharinger and Stark (1990) found a holistic evaluation on the basis of four characteristics of the KFD drawing was a better predictor of criteria than a 37-item scoring system.
Koppitz scoring. Koppitz (1968) selected 30 DAP signs she thought were indicative of emotional distress on the basis of her clinical experience. Although this technique was for many years the best-known approach to the scoring of emotional distress, research results have not been encouraging (e.g., Pihl & Nimrod, 1976). Tharinger and Stark (1990) found a holistic evaluation system on the basis of four characteristics of the DAP drawing was again a better predictor of emotional distress.
Draw A Person: Screening Procedure for Emotional Disturbance. The Draw A Person: Screening Procedure for Emotional Disturbance (DAP:SPED; Naglieri et al., 1991) consists of 55 criteria, identified through a review of the DAP research literature, that are expected to distinguish between normal children and children with emotional difficulties. The authors reported interrater reliability statistics for the DAP:SPED greater than .90, and internal reliability coefficients that varied between .71 and .77. They also found 1-week test–retest reliability statistics that exceeded .90. Wrightson and Saklofske (2000) reported a test–retest correlation of only .48 over 23 to 27 weeks. They noted that some of the children in their sample were in treatment during the intervening period, but their results raise concerns about the stability of scores on the DAP:SPED over longer intervals.
The DAP:SPED correlated mildly (.20–.30) with self-report scales of emotional difficulties (e.g., Wrightson & Saklofske, 2000). Scores on the DAP:SPED also differentiated between children with and without emotional problems, although the accuracy of classification was less than would be desirable (Matto, Naglieri, & Claussen, 2005; McNeish & Naglieri, 1993; Naglieri & Pfeiffer, 1992; Wrightson & Saklofske, 2000). Given that the DAP:SPED involves even more judgments than the Koppitz criteria, a direct comparison of the DAP:SPED to the Tharinger and Stark (1990) holistic evaluation of the DAP would seem to be a useful topic for research.
General Conclusions
The use of individual signs from figure drawings as indicators of specific personality descriptors has been largely invalidated, but the scoring systems that were subsequently developed have their own problems. It is unclear whether an extensive scoring system dedicated solely to evaluating emotional distress represents a reasonable cost–benefit ratio. If drawings will be gathered anyway, to reduce anxiety or establish rapport with a child, the holistic coding described by Tharinger and Stark (1990) offers an intriguing alternative requiring simple judgments by the clinician. Future research might also look into whether increasing the number of pictures offers any increment in validity.
CONCLUSION
It is likely that BITs will continue to play an important role in clinical assessment. At least some BITs are useful for reducing anxiety or deflecting focus from the respondent. As Table 17.1 indicates, their broad bandwidth allows the clinician to observe the respondent from multiple perspectives. Clinicians’ faith in the incremental validity of implicit measures over self-report measures has been used to justify their greater cost. Finally, gifted clinicians have described the use of their ambiguous qualities in a flexible way to test hypotheses about the respondent (e.g., Finn, 2003). One may expect continuing research to appear on BITs, and the Rorschach in particular, as clinical tools.
Their fate as research tools is more dubious. Complicating matters is the traditional association between BITs and psychoanalysis, an association that potentially interferes with a fair evaluation of their potential use in research. A more accurate and useful parallel may be drawn with narrowband implicit measures such as the IAT. This association allows one to draw several valuable conclusions about how best to understand BITs.
First, there is room yet for building a better mousetrap. The optimal BIT would consistently use stimuli that are obscure enough to encourage individual responding but evocative enough to engage the respondent. Second, popular narrowband techniques demonstrate an intuitive connection with the implicit process they attempt to gauge. A review of the constructs for which the BITs seem to demonstrate adequate fidelity would suggest they are all intuitively linked to the process or perceptual tendency they are intended to measure. The purely actuarial approach has largely failed as a means to identify BIT variables (McGrath, 2008).
Analytic theory suggests the drive to express conflicts and wishes will inevitably emerge in ambiguous situations. This assumption trivializes consistency in administration when there is consistent evidence that neutral instructions contribute to BIT validity (e.g., Lundy, 1988) and that variability in administration compromises reliability. It also exaggerates the likelihood that results will generalize. Research with implicit measures provides little evidence that such measures converge with self-report, or with one another, even as evidence is growing that narrowband implicit techniques can demonstrate incremental validity over self-report (Greenwald, Poehlman, Uhlmann, & Banaji, 2009). There is reasonable evidence that the three BITs reviewed demonstrate fidelity for some constructs; however, for figure drawings, this statement applies only to emotional disturbance. The widespread abandonment of BITs as research operationalizations may exceed the justification for doing so, although the administration costs may continue to suppress their use as research tools. These findings hardly represent a blanket endorsement of the use of BITs for research and clinical work, but they do suggest the right technique used in the right circumstances can potentially provide a useful method of measurement.
References
Ackerman, S. J., Clemence, A. J., Weatherill, R., & Hilsenroth, M. J. (1999). Use of the TAT in the assessment of DSM–IV Cluster B personality disorders. Journal of Personality Assessment, 73, 422–448. doi:10.1207/S15327752JPA7303_9

Acklin, M. W., McDowell, C. J., Verschell, M. S., & Chan, D. (2000). Interobserver agreement, intraobserver reliability, and the Rorschach Comprehensive System. Journal of Personality Assessment, 74, 15–47. doi:10.1207/S15327752JPA740103

Ainsworth, M. D. S., Blehar, M. C., Waters, E., & Wall, S. (1978). Patterns of attachment: A psychological study of the strange situation. Hillsdale, NJ: Erlbaum.
Archer, R. P., & Krishnamurthy, R. (1993). A review of MMPI and Rorschach interrelationships in adult samples. Journal of Personality Assessment, 61, 277–293. doi:10.1207/s15327752jpa6102_9
Archer, R. P., & Krishnamurthy, R. (1997). MMPI–A and Rorschach indices related to depression and conduct disorder: An evaluation of the incremental validity hypothesis. Journal of Personality Assessment, 69, 517–533. doi:10.1207/s15327752jpa6903_7
Archer, R. P., & Newsom, C. (2000). Psychological test usage with adolescent clients: Survey update. Assessment, 7, 227–235. doi:10.1177/107319110000700303
Atkinson, J. W. (1981). Studying personality in the context of an advanced motivational psychology. American Psychologist, 36, 117–128. doi:10.1037/0003-066X.36.2.117
Atkinson, L. (1986). The comparative validities of the Rorschach and MMPI: A meta-analysis. Canadian Psychology/Psychologie canadienne, 27, 238–247. doi:10.1037/h0084337
Atkinson, L., Quarrington, B., Alp, I., & Cyr, J. (1986). Rorschach validity: An empirical approach to the literature. Journal of Clinical Psychology, 42, 360–362. doi:10.1002/1097-4679(198603)42:2<360::AID-JCLP2270420225>3.0.CO;2-R
Bellak, L. (1944). The concept of projection: An experimental investigation and study of the concept. Psychiatry: Journal for the Study of Interpersonal Processes, 7, 353–370.
Bellak, L., & Bellak, S. (1949). The Children’s Apperception Test. Larchmont, NY: CPS.
Bellak, L., & Bellak, S. (1996). The Senior Apperception Technique. Larchmont, NY: CPS.
Belter, R. W., & Piotrowski, C. (2001). Current status of doctoral-level training in psychological testing. Journal of Clinical Psychology, 57, 717–726. doi:10.1002/jclp.1044
Bombel, G., Mihura, J., & Meyer, G. (2009). An examination of the construct validity of the Rorschach Mutuality of Autonomy (MOA) Scale. Journal of Personality Assessment, 91, 227–237. doi:10.1080/00223890902794267
Bornstein, R. F. (2009). Heisenberg, Kandinsky, and the heteromethod convergence problem: Lessons from within and beyond psychology. Journal of Personality Assessment, 91, 1–8. doi:10.1080/00223890802483235
Bornstein, R. F., & Masling, J. M. (2005). The Rorschach Oral Dependency Scale. In R. F. Bornstein & J. M. Masling (Eds.), Scoring the Rorschach: Seven validated systems (pp. 135–157). Mahwah, NJ: Erlbaum.
Buck, J. N. (1948). The H-T-P technique: A qualitative and quantitative scoring manual. Journal of Clinical Psychology, 4, 317–396. doi:10.1002/1097-4679(194810)4:4<317::AID-JCLP2270040402>3.0.CO;2-6
Burns, R. C., & Kaufman, S. H. (1972). Actions, styles and symbols in Kinetic Family Drawings (K-F-D): An interpretive manual. New York, NY: Brunner-Routledge.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory—2 (MMPI–2): Manual for administration and scoring. Minneapolis: University of Minnesota Press.
Cicchetti, D. V., & Sparrow, S. S. (1981). Developing criteria for establishing the interrater reliability of specific items in a given inventory. American Journal of Mental Deficiency, 86, 127–137.
Cramer, P. (1991). The development of defense mechanisms: Theory, research and assessment. New York, NY: Springer-Verlag.
Cramer, P. (1999). Future directions for the Thematic Apperception Test. Journal of Personality Assessment, 72, 74–92. doi:10.1207/s15327752jpa7201_5
Cramer, P. (2009). The development of defense mechanisms from pre-adolescence to early adulthood: Do IQ and social class matter? A longitudinal study. Journal of Research in Personality, 43, 464–471. doi:10.1016/j.jrp.2009.01.021
Cronbach, L. J. (1949). Statistical methods applied to Rorschach scores: A review. Psychological Bulletin, 46, 393–429. doi:10.1037/h0059467
Cronbach, L. J., & Gleser, G. C. (1957). Psychological tests and personnel decisions. Urbana: University of Illinois Press.
Cummings, J. A. (1986). Projective drawings. In H. Knoff (Ed.), The assessment of child and adolescent personality (pp. 199–244). New York, NY: Guilford Press.
Dao, T. K., Prevatt, F., & Horne, H. L. (2008). Differentiating psychotic patients from nonpsychotic patients with the MMPI–2 and Rorschach. Journal of Personality Assessment, 90, 93–101.
Dawes, R. M. (1999). Two methods for studying the incremental validity of a Rorschach variable. Psychological Assessment, 11, 297–302. doi:10.1037/1040-3590.11.3.297
Dean, K. L., Viglione, D., Perry, W., & Meyer, G. (2007). A method to optimize the response range while maintaining Rorschach comprehensive system validity. Journal of Personality Assessment, 89, 149–161.
Deri, S. (1949). Introduction to the Szondi test. New York, NY: Grune & Stratton.
Edwards, J. R., & Bagozzi, R. P. (2000). On the nature and direction of relationships between constructs and measures. Psychological Methods, 5, 155–174. doi:10.1037/1082-989X.5.2.155
Entwisle, D. R. (1972). To dispel fantasies about fantasy-based measures of achievement motivation. Psychological Bulletin, 77, 377–391. doi:10.1037/h0020021
Eurelings-Bontekoe, E. H. M., Luyten, P., & Snellen, W. (2009). Validation of a theory-driven profile interpretation of the Dutch Short Form of the MMPI using the TAT Social Cognitions and Object Relations Scale (SCORS). Journal of Personality Assessment, 91, 155–165. doi:10.1080/00223890802634274
Exner, J. E., Jr. (1989). Searching for projection in the Rorschach. Journal of Personality Assessment, 53, 520–536. doi:10.1207/s15327752jpa5303_9
Exner, J. E., Jr. (2003). The Rorschach: A comprehensive system: I. Basic foundations and principles of interpretation (4th ed.). New York, NY: Wiley.
Exner, J. E., Jr., & Exner, D. E. (1972). How clinicians use the Rorschach. Journal of Personality Assessment, 36, 403–408. doi:10.1080/00223891.1972.10119784
Finn, S. E. (2003). Therapeutic assessment of a man with “ADD.” Journal of Personality Assessment, 80, 115–129. doi:10.1207/S15327752JPA8002_01
Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York, NY: Wiley.
Fowler, J. C., Ackerman, S. J., Speanburg, S., Bailey, A., Blagys, M., & Conklin, A. C. (2004). Personality and symptom change in treatment-refractory inpatients: Evaluation of the phase model of change using Rorschach, TAT, and DSM–IV Axis V. Journal of Personality Assessment, 83, 306–322. doi:10.1207/s15327752jpa8303_12
Fowler, J. C., & Groat, M. (2008). Personality assessment using implicit (projective) methods. In M. Hersen & A. Gross (Eds.), Handbook of clinical psychology: Vol. 1. Adults (pp. 475–494). Hoboken, NJ: Wiley.
Frank, G. (1990). Research on the clinical usefulness of the Rorschach: I. The diagnosis of schizophrenia. Perceptual and Motor Skills, 71, 573–578.
Frank, L. K. (1939). Projective methods for the study of personality. Journal of Psychology: Interdisciplinary and Applied, 8, 389–413. doi:10.1080/00223980.1939.9917671
Frank, L. K. (1948). Projective methods. Springfield, IL: Charles C Thomas.
Freud, S. (1962). Further remarks on the neuropsychoses of defence. In J. Strachey (Ed. & Trans.), The standard edition of the complete psychological works of Sigmund Freud (Vol. 3, pp. 159–188). London, England: Hogarth. (Original work published 1896)
Freud, S. (1990). Totem and taboo: The standard edition. New York, NY: Norton. (Original work published 1913)
Gacono, C. B., Bannatyne-Gacono, L., Meloy, J. R., & Baity, M. R. (2005). The Rorschach extended aggression scores. Rorschachiana, 27, 164–190. doi:10.1027/1192-5604.27.1.164
Garb, H. N. (1984). The incremental validity of information used in personality assessment. Clinical Psychology Review, 4, 641–655. doi:10.1016/0272-7358(84)90010-2
Garb, H. N., Florio, C. M., & Grove, W. M. (1998). The validity of the Rorschach and the Minnesota Multiphasic Personality Inventory: Results from meta-analyses. Psychological Science, 9, 402–404. doi:10.1111/1467-9280.00075
Garb, H. N., Florio, C. M., & Grove, W. M. (1999). The Rorschach controversy: Reply to Parker, Hunsley, and Hanson. Psychological Science, 10, 293–294. doi:10.1111/1467-9280.00154
Garb, H. N., Wood, J. M., Nezworski, M. T., Grove, W. M., & Stejskal, W. J. (2001). Toward a resolution of the Rorschach controversy. Psychological Assessment, 13, 433–448. doi:10.1037/1040-3590.13.4.433
Gawronski, B., LeBel, E. P., & Peters, K. R. (2007). What do implicit measures tell us? Scrutinizing the validity of three commonplace assumptions. Perspectives on Psychological Science, 2, 181–193. doi:10.1111/j.1745-6916.2007.00036.x
Goldfried, M. R., & Kent, R. N. (1972). Traditional versus behavioral personality assessment: A comparison of methodological and theoretical assumptions. Psychological Bulletin, 77, 409–420. doi:10.1037/h0032714
Greenwald, A. G., McGhee, D., & Schwartz, J. (1998). Measuring individual differences in implicit cognition: The Implicit Association Test. Journal of Personality and Social Psychology, 74, 1464–1480. doi:10.1037/0022-3514.74.6.1464
Greenwald, A. G., Poehlman, T., Uhlmann, E., & Banaji, M. (2009). Understanding and using the Implicit Association Test: III. Meta-analysis of predictive validity. Journal of Personality and Social Psychology, 97, 17–41. doi:10.1037/a0015575
Grønnerød, C. (2003). Temporal stability in the Rorschach method: A meta-analytic review. Journal of Personality Assessment, 80, 272–293. doi:10.1207/S15327752JPA8003_06
Grønnerød, C. (2006). Reanalysis of the Grønnerød (2003) Rorschach temporal stability meta-analysis data set. Journal of Personality Assessment, 86, 222–225. doi:10.1207/s15327752jpa8602_12
Groth-Marnat, G. (2009). Handbook of psychological assessment (5th ed.). Hoboken, NJ: Wiley.
Handler, L., & Clemence, A. J. (2005). The Rorschach Prognostic Rating Scale. In R. F. Bornstein & J. M. Masling (Eds.), Scoring the Rorschach: Seven validated systems (pp. 25–54). Mahwah, NJ: Erlbaum.
Hibbard, S., Mitchell, D., & Porcerelli, J. (2001). Internal consistency of the Object Relations and Social Cognition scales for the Thematic Apperception Test. Journal of Personality Assessment, 77, 408–419. doi:10.1207/S15327752JPA7703_03
Hiller, J. B., Rosenthal, R., Bornstein, R. F., Berry, D. T. R., & Brunell-Neuleib, S. (1999). A comparative meta-analysis of Rorschach and MMPI validity. Psychological Assessment, 11, 278–296. doi:10.1037/1040-3590.11.3.278
Hilsenroth, M., Charnas, J., Zodan, J., & Streiner, D. (2007). Criterion-based training for Rorschach scoring. Training and Education in Professional Psychology, 1, 125–134. doi:10.1037/1931-3918.1.2.125
Holt, R. R. (2005). The Pripro scoring system. In R. F. Bornstein & J. M. Masling (Eds.), Scoring the Rorschach: Seven validated systems (pp. 191–235). Mahwah, NJ: Erlbaum.
Holtzman, W. H., Thorpe, J. S., Swartz, J. D., & Herron, E. W. (1961). Inkblot perception and personality. Austin, TX: University of Texas Press.
Holzman, P. S., Levy, D., & Johnston, M. H. (2005). The use of the Rorschach technique for assessing formal thought disorder. In R. F. Bornstein & J. M. Masling (Eds.), Scoring the Rorschach: Seven validated systems (pp. 55–95). Mahwah, NJ: Erlbaum.
Hunsley, J., & Bailey, J. M. (1999). The clinical utility of the Rorschach: Unfulfilled promises and an uncertain future. Psychological Assessment, 11, 266–277. doi:10.1037/1040-3590.11.3.266
Hutt, M. L. (1985). The Hutt adaptation of the Bender-Gestalt Test: Rapid screening and intensive diagnosis (4th ed.). New York, NY: Grune & Stratton.
Jenkins, S. R. (2008). A handbook of clinical scoring systems for thematic apperceptive techniques. Mahwah, NJ: Erlbaum.
Joiner, T., Schmidt, K., & Barnett, J. (1996). Size, detail, and line heaviness in children’s drawings as correlates of emotional distress: (More) negative evidence. Journal of Personality Assessment, 67, 127–141.
Katko, N., Meyer, G., Mihura, J., & Bombel, G. (2009). The interrater reliability of Elizur’s hostility systems and Holt’s aggression variables: A meta-analytical review. Journal of Personality Assessment, 91, 357–364.
Keiser, R. E., & Prather, E. (1990). What is the TAT? A review of ten years of research. Journal of Personality Assessment, 55, 800–803. doi:10.1207/s15327752jpa5503&4_36
Kihlstrom, J. F. (2008). The psychological unconscious. In O. P. John, R. W. Robins, & L. A. Pervin (Eds.), Handbook of personality: Theory and research (3rd ed., pp. 583–602). New York, NY: Guilford Press.
Knoff, H. M., & Prout, H. T. (1985). The kinetic drawing system: Family and school. Los Angeles, CA: Western Psychological Services.
Koppitz, E. M. (1968). Psychological evaluation of children’s human figure drawings. New York, NY: Grune & Stratton.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174. doi:10.2307/2529310
Levy, K. N., Meehan, K. B., Auerbach, J. S., & Blatt, S. J. (2005). Concept of the Object on the Rorschach Scale. In R. F. Bornstein & J. M. Masling (Eds.), Scoring the Rorschach: Seven validated systems (pp. 97–133). Mahwah, NJ: Erlbaum.
Lindgren, T., Carlsson, A., & Lundbäck, E. (2007). No agreement between the Rorschach and self-assessed personality traits derived from the Comprehensive System. Scandinavian Journal of Psychology, 48, 399–408. doi:10.1111/j.1467-9450.2007.00590.x
Lis, A., Parolin, L., Calvo, V., Zennaro, A., & Meyer, G. (2007). The impact of administration and inquiry on Rorschach Comprehensive System protocols in a national reference sample. Journal of Personality Assessment, 89(Suppl. 1), S193–S200.
Lubin, B., Larsen, R., & Matarazzo, J. (1984). Patterns of psychological test usage in the United States: 1935–1982. American Psychologist, 39, 451–454. doi:10.1037/0003-066X.39.4.451
Lundy, A. (1985). The reliability of the Thematic Apperception Test. Journal of Personality Assessment, 49, 141–145. doi:10.1207/s15327752jpa4902_6
Lundy, A. (1988). Instructional set and Thematic Apperception Test validity. Journal of Personality Assessment, 52, 309–320. doi:10.1207/s15327752jpa5202_12
Machover, K. (1949). Personality projection in the drawing of the human figure. Springfield, IL: Thomas. doi:10.1037/11147-000
Mattlar, C.-E. (2004). Are we entitled to use Rorschach Workshop’s norms when interpreting the Comprehensive System in Finland? Rorschachiana, 26, 85–109. doi:10.1027/1192-5604.26.1.85
Matto, H. C., Naglieri, J. A., & Claussen, C. (2005). Validity of the Draw-A-Person: Screening Procedure for Emotional Disturbance (DAP:SPED) in strengthbased assessment. Research on Social Work Practice, 15, 41–46. doi:10.1177/1049731504269553
McAdams, D. P. (1982). Experiences of intimacy and power: Relationships between social motives and autobiographical memory. Journal of Personality and Social Psychology, 42, 292–302. doi:10.1037/0022-3514.42.2.292
McClelland, D. C. (1965). N achievement and entrepreneurship: A longitudinal study. Journal of Personality and Social Psychology, 1, 389–392. doi:10.1037/h0021956
McClelland, D. C. (1975). Power: The inner experience. New York, NY: Irvington.
McClelland, D. C., Atkinson, J. W., Clark, R. A., & Lowell, E. L. (1953). The achievement motive. New York, NY: Irvington. doi:10.1037/11144-000
McClelland, D. C., Koestner, R., & Weinberger, J. (1989). How do self-attributed and implicit motives differ? Psychological Review, 96, 690–702. doi:10.1037/0033-295X.96.4.690
McDowell, C., & Acklin, M. W. (1996). Standardizing procedures for calculating Rorschach interrater reliability: Conceptual and empirical foundations. Journal of Personality Assessment, 66, 308–320. doi:10.1207/s15327752jpa6602_9
McGrath, R. E. (2003). Achieving accuracy in testing procedures: The Comprehensive System as a case example. Journal of Personality Assessment, 81, 104–110. doi:10.1207/S15327752JPA8102_02
McGrath, R. E. (2005). Conceptual complexity and construct validity. Journal of Personality Assessment, 85, 112–124. doi:10.1207/s15327752jpa8502_02
McGrath, R. E. (2008). The Rorschach in the context of performance-based personality assessment. Journal of Personality Assessment, 90, 465–475. doi:10.1080/00223890802248760
McGrath, R. E., Pogge, D. L., Stokes, J. M., Cragnolino, A., Zaccario, M., Hayman, J., . . . Wayland-Smith, D. (2005). Comprehensive System scoring reliability in an adolescent inpatient sample. Assessment, 12, 199–209. doi:10.1177/1073191104273384
McNeish, T. J., & Naglieri, J. A. (1993). Identification of individuals with serious emotional disturbance using the Draw A Person: Screening Procedure for Emotional Disturbance. The Journal of Special Education, 27, 115–121. doi:10.1177/002246699302700108
Meehl, P. E. (1945). The dynamics of “structured” personality tests. Journal of Clinical Psychology, 1, 296–303.
Meyer, G. J. (1997a). Assessing reliability: Critical corrections for a critical examination of the Rorschach Comprehensive System. Psychological Assessment, 9, 480–489. doi:10.1037/1040-3590.9.4.480
Meyer, G. J. (1997b). On the integration of personality assessment methods: The Rorschach and MMPI–2. Journal of Personality Assessment, 68, 297–330. doi:10.1207/s15327752jpa6802_5
Meyer, G. J. (2000). Incremental validity of the Rorschach Prognostic Rating scale over the MMPI Ego Strength Scale and IQ. Journal of Personality Assessment, 74, 356–370. doi:10.1207/S15327752JPA7403_2
Meyer, G. J. (2001). Evidence to correct misperceptions about Rorschach norms. Clinical Psychology: Science and Practice, 8, 389–396. doi:10.1093/clipsy.8.3.389
Meyer, G. J. (2004). The reliability and validity of the Rorschach and Thematic Apperception Test (TAT) compared to other psychological and medical procedures: An analysis of systematically gathered evidence. In M. J. Hilsenroth & D. L. Segal (Eds.), Comprehensive handbook of psychological assessment: Vol. 2. Personality assessment (pp. 315–342). Hoboken, NJ: Wiley.
Meyer, G. J., Erdberg, P., & Shaffer, T. (2007). Toward international normative reference data for the Comprehensive System. Journal of Personality Assessment, 89(Suppl. 1), S201–S216.
Meyer, G. J., Hilsenroth, M., Baxter, D., Exner, J., Fowler, J., Piers, C., & Resnick, J. (2002). An examination of interrater reliability for scoring the Rorschach comprehensive system in eight data sets. Journal of Personality Assessment, 78, 219–274. doi:10.1207/S15327752JPA7802_03
Meyer, G. J., & Kurtz, J. (2006). Advancing personality assessment terminology: Time to retire “objective” and “projective” as personality test descriptors. Journal of Personality Assessment, 87, 223–225. doi:10.1207/s15327752jpa8703_01
Meyer, G. J., Riethmiller, R., Brooks, R., Benoit, W., & Handler, L. (2000). A replication of Rorschach and MMPI–2 convergent validity. Journal of Personality Assessment, 74, 175–215. doi:10.1207/S15327752JPA7402_3
Meyer, G. J., Viglione, D. J., Mihura, J. L., Erard, R. E., & Erdberg, P. (2010, March). Introducing key features of the Rorschach Performance Assessment System (RPAS). Symposium presented at the Midwinter Meeting of the Society for Personality Assessment, San Jose, CA.
Mihura, J., Meyer, G., Bombel, G., & Dumitrascu, N. (2010, March). A review of the validity research as a basis for variable selection. Presented at the Midwinter Meeting of the Society for Personality Assessment, San Jose, CA.
Minassian, A., & Perry, W. (2004). The use of projective tests in assessing neurologically impaired populations. In M. J. Hilsenroth & D. L. Segal (Eds.), Comprehensive handbook of psychological assessment: Vol. 2. Personality assessment (pp. 539–552). Hoboken, NJ: Wiley.
Mischel, W. (1968). Personality and assessment. New York, NY: Wiley.
Morgan, C., & Murray, H. A. (1935). A method for investigating fantasies: The Thematic Apperception Test. Archives of Neurology and Psychiatry (Chicago), 34, 289–306.
Murray, H. A. (1938). Explorations in personality. New York, NY: Oxford University Press.
Murray, H. A. (1943). Manual for the Thematic Apperception Test. Cambridge, MA: Harvard University Press.
Musewicz, J., Marczyk, G., Knauss, L., & York, D. (2009). Current assessment practice, personality measurement, and Rorschach usage by psychologists. Journal of Personality Assessment, 91, 453–461. doi:10.1080/00223890903087976
Naglieri, J. A., McNeish, T. J., & Bardos, A. N. (1991). Draw A Person: Screening Procedure for Emotional Disturbance: Examiner’s manual. Austin, TX: Pro-Ed.
Naglieri, J. A., & Pfeiffer, S. I. (1992). Performance of disruptive behavior disordered and normal samples on the Draw-A-Person: Screening Procedure for Emotional Disturbance. Psychological Assessment, 4, 156–159. doi:10.1037/1040-3590.4.2.156
Niec, L. N., & Russ, S. (2002). Children’s internal representations, empathy and fantasy play: A validity study of the SCORS-Q. Psychological Assessment, 14, 331–338. doi:10.1037/1040-3590.14.3.331
Norcross, J., Koocher, G., & Garofalo, A. (2006). Discredited psychological treatments and tests: A Delphi poll. Professional Psychology: Research and Practice, 37, 515–522. doi:10.1037/0735-7028.37.5.515
Nosek, B. A., Greenwald, A. G., & Banaji, M. R. (2007). The Implicit Association Test at age 7: A methodological and conceptual review. In J. A. Bargh (Ed.), Automatic processes in social thinking and behavior (pp. 265–292). New York, NY: Psychology Press.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.
O’Neill, R. M. (2005). Body image, body boundary, and the Barrier and Penetration Rorschach scoring system. In R. F. Bornstein & J. M. Masling (Eds.), Scoring the Rorschach: Seven validated systems (pp. 159–189). Mahwah, NJ: Erlbaum.
Parker, K. (1983). A meta-analysis of the reliability and validity of the Rorschach. Journal of Personality Assessment, 47, 227–231. doi:10.1207/s15327752jpa4703_1
Parker, K. C. H., Hanson, R. K., & Hunsley, J. (1988). MMPI, Rorschach, and WAIS: A meta-analytic comparison of reliability, stability, and validity. Psychological Bulletin, 103, 367–373. doi:10.1037/0033-2909.103.3.367
Parker, K. C. H., Hunsley, J., & Hanson, R. K. (1999). Old wine from old skins sometimes tastes like vinegar: A response to Garb, Florio, and Grove. Psychological Science, 10, 291–292. doi:10.1111/1467-9280.00153
Phaf, R. H., & Kan, K. (2007). The automaticity of emotional Stroop: A meta-analysis. Journal of Behavior Therapy and Experimental Psychiatry, 38, 184–199. doi:10.1016/j.jbtep.2006.10.008
Pihl, R., & Nimrod, G. (1976). The reliability and validity of the Draw-A-Person Test in IQ and personality assessment. Journal of Clinical Psychology, 32, 470–472. doi:10.1002/1097-4679(197604)32:2<470::AID-JCLP2270320257>3.0.CO;2-I
Pinkerman, J. E., Haynes, J. P., & Keiser, T. (1993). Characteristics of psychological practice in juvenile court clinics. American Journal of Forensic Psychology, 11, 3–12.
Piotrowski, C. (1999). Assessment practices in the era of managed care: Current status and future directions. Journal of Clinical Psychology, 55, 787–796. doi:10.1002/(SICI)1097-4679(199907)55:7<787::AID-JCLP2>3.0.CO;2-U
Piotrowski, C., & Keller, J. W. (1984). Psychodiagnostic testing in APA-approved clinical psychology programs. Professional Psychology: Research and Practice, 15, 450–456. doi:10.1037/0735-7028.15.3.450
Piotrowski, C., & Zalewski, C. (1993). Training in psychodiagnostic testing in APA-approved Psy.D. and Ph.D. clinical psychology programs. Journal of Personality Assessment, 61, 394–405. doi:10.1207/s15327752jpa6102_17
Rapaport, D. (1946). Diagnostic psychological testing (Vol. 1). Chicago, IL: Yearbook Publishers.
Riethmiller, R. J., & Handler, L. (1997). Problematic methods and unwarranted conclusions in DAP research: Suggestions for improved research procedures. Journal of Personality Assessment, 69, 459–475. doi:10.1207/s15327752jpa6903_1
Roberts, G. E., & Gruber, C. P. (2005). Roberts-2 manual. Los Angeles, CA: Western Psychological Services.
Ronan, G. F., Gibbs, M. S., Dreer, L. E., & Lombardo, J. A. (2008). Personal Problem-Solving System—Revised. In S. R. Jenkins (Ed.), A handbook of clinical scoring systems for thematic apperceptive techniques (pp. 181–207). Mahwah, NJ: Erlbaum.
Rorschach, H. (1942). Psychodiagnostics: A diagnostic test based on perception. Oxford, England: Hans Huber. (Original work published 1921)
Rosenthal, R., Hiller, J., Bornstein, R., Berry, D., & Brunell-Neuleib, S. (2001). Meta-analytic methods, the Rorschach, and the MMPI. Psychological Assessment, 13, 449–451. doi:10.1037/1040-3590.13.4.449
Rosenzweig, S. (1978). Rosenzweig Picture–Frustration Study (P-F) (rev. ed.). Lutz, FL: Psychological Assessment Resources.
Rotter, J. B., Lah, M. I., & Rafferty, J. E. (1992). Manual: The Rotter Incomplete Sentences Blank: College form. New York, NY: Psychological Corporation.
Schultheiss, O., Liening, S., & Schad, D. (2008). The reliability of a Picture Story Exercise measure of implicit motives: Estimates of internal consistency, retest reliability, and ipsative stability. Journal of Research in Personality, 42, 1560–1571. doi:10.1016/j.jrp.2008.07.008
Shrout, P. E. (1998). Measurement reliability and agreement in psychiatry. Statistical Methods in Medical Research, 7, 301–317. doi:10.1191/096228098672090967
Shrout, P. E., Spitzer, R. L., & Fleiss, J. L. (1987). Quantification of agreement in psychiatric diagnosis revisited. Archives of General Psychiatry, 44, 172–177.
Spangler, W. (1992). Validity of questionnaire and TAT measures of need for achievement: Two meta-analyses. Psychological Bulletin, 112, 140–154. doi:10.1037/0033-2909.112.1.140
Stokes, J. M., Pogge, D., Powell-Lunder, J., Ward, A., Bilginer, L., & DeLuca, V. (2003). The Rorschach Ego Impairment Index: Prediction of treatment outcome in a child psychiatric population. Journal of Personality Assessment, 81, 11–19. doi:10.1207/S15327752JPA8101_02
Sundberg, N. (1961). The practice of psychological testing in clinical services in the United States. American Psychologist, 16, 79–83. doi:10.1037/h0040647
Swensen, C. H. (1957). Empirical evaluations of human figure drawings. Psychological Bulletin, 54, 431–466. doi:10.1037/h0041404
Swensen, C. H. (1968). Empirical evaluations of human figure drawings: 1957–1966. Psychological Bulletin, 70, 20–44. doi:10.1037/h0026011
Tharinger, D. J., & Stark, K. (1990). A qualitative versus quantitative approach to evaluating the Draw-A-Person and Kinetic Family Drawings: A study of mood- and anxiety-disordered children. Psychological Assessment, 2, 365–375. doi:10.1037/1040-3590.2.4.365
Thompson, C. E. (1949). Thematic Apperception Test: Thompson modification. Cambridge, MA: Harvard University Press.
Vatnaland, T., Vatnaland, J., Friis, S., & Opjordsmoen, S. (2007). Are GAF scores reliable in routine clinical use? Acta Psychiatrica Scandinavica, 115, 326–330. doi:10.1111/j.1600-0447.2006.00925.x
Wakefield, J. (1986). Creativity and the TAT blank card. The Journal of Creative Behavior, 20, 127–133.
Weiner, I. B. (1977). Approaches to Rorschach validation. In M. A. Rickers-Ovsiankina (Ed.), Rorschach psychology (pp. 575–608). Huntington, NY: Krieger.
Weiner, I. B. (1994). The Rorschach Inkblot Method (RIM) is not a test: Implications for theory and practice. Journal of Personality Assessment, 62, 498–504. doi:10.1207/s15327752jpa6203_9
Weiner, I. B. (2003). Principles of Rorschach interpretation (2nd ed.). Mahwah, NJ: Erlbaum.
Westen, D. (1991). Social cognition and object relations. Psychological Bulletin, 109, 429–455. doi:10.1037/0033-2909.109.3.429
Westen, D. (1995). Social Cognition and Object Relations Scale: Q-sort for projective stories (SCORS Q). Unpublished manuscript, Department of Psychology, Emory University, Atlanta, GA.
Westen, D. (1998). The scientific legacy of Sigmund Freud: Toward a psychodynamically informed psychological science. Psychological Bulletin, 124, 333–371. doi:10.1037/0033-2909.124.3.333
Winch, R. F., & More, D. M. (1956). Does TAT add information to interviews? Statistical analysis of the increment. Journal of Clinical Psychology, 12, 316–321. doi:10.1002/1097-4679(195610)12:4<316::AID-JCLP2270120403>3.0.CO;2-P
Wood, J. M., Krishnamurthy, R., & Archer, R. (2003). Three factors of the Comprehensive System for the Rorschach and their relationship to Wechsler IQ Scores in an adolescent sample. Assessment, 10, 259–265. doi:10.1177/1073191103255493
Wood, J. M., Nezworski, M. T., & Garb, H. N. (2003). What’s right with the Rorschach? The Scientific Review of Mental Health Practice, 2, 142–146.
Wood, J. M., Nezworski, M. T., Garb, H. N., & Lilienfeld, S. O. (2001). The misperception of psychopathology: Problems with the norms of the Comprehensive System for the Rorschach. Clinical Psychology: Science and Practice, 8, 350–373. doi:10.1093/clipsy.8.3.350
Wood, J. M., Nezworski, M. T., Lilienfeld, S. O., & Garb, H. N. (2003). What’s wrong with the Rorschach? Science confronts the controversial inkblot test. San Francisco, CA: Jossey-Bass.
Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1996). The Comprehensive System for the Rorschach: A critical examination. Psychological Science, 7, 3–10. doi:10.1111/j.1467-9280.1996.tb00658.x
Wrightson, L., & Saklofske, D. (2000). Validity and reliability of the Draw A Person: Screening Procedure for Emotional Disturbance with adolescent students. Canadian Journal of School Psychology, 16, 95–102. doi:10.1177/082957350001600107
Ziegler, M., Schmukle, S., Egloff, B., & Bühner, M. (2010). Investigating measures of achievement motivation(s). Journal of Individual Differences, 31, 15–21. doi:10.1027/1614-0001/a000002
Social Cognition and Object Relations Scale.
Westen (1991, 1995) developed the Social Cognition and Object Relations Scale (SCORS) to tap dimensions of psychological functioning derived from object relations theory. The most recent version is composed of eight scales (Complexity, Affect, Relationships, Morals, Causality, Aggression, Self-Esteem, and Identity), each of which is scored on a 1 to 7 global rating indicating level of maturity, although many studies use an earlier version involving 1 to 5 ratings of only four dimensions. The ratings for each SCORS variable are averaged across the TAT responses. A substantial body of literature from multiple laboratories supports the overall validity of the SCORS (e.g., Ackerman, Clemence, Weatherill, & Hilsenroth, 1999; Eurelings-Bontekoe, Luyten, & Snellen, 2009; Fowler et al., 2004; Niec & Russ, 2002). Although concerns were raised earlier about the lack of convergence among implicit measures, there is evidence of convergent validity between the SCORS and Rorschach, and in some cases even between the SCORS and self-report measures.
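The aggregation step just described (a 1 to 7 rating on each of the eight dimensions per story, averaged across all of a respondent's TAT responses) can be sketched in a few lines. This is only an illustration of the arithmetic: the scale names come from the text, but the function, data layout, and ratings are invented for the example and are not part of any published SCORS materials.

```python
# Illustrative sketch of SCORS aggregation: each TAT story receives a
# 1-7 maturity rating on each of eight dimensions, and the ratings are
# averaged across stories. Hypothetical data; not an official scoring tool.

SCALES = ["Complexity", "Affect", "Relationships", "Morals",
          "Causality", "Aggression", "Self-Esteem", "Identity"]

def average_scors(ratings_per_story):
    """ratings_per_story: list of dicts mapping scale name -> 1-7 rating,
    one dict per TAT story. Returns the per-scale mean across stories."""
    n = len(ratings_per_story)
    return {scale: sum(story[scale] for story in ratings_per_story) / n
            for scale in SCALES}

# Two hypothetical stories, rated 4 and 6 on every dimension:
stories = [{s: 4 for s in SCALES}, {s: 6 for s in SCALES}]
print(average_scors(stories)["Complexity"])  # prints 5.0
```

In practice each story is rated by trained coders against the SCORS manual's anchors; the averaging shown here is simply how the story-level ratings become the person-level scores discussed in the validity studies cited above.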
Motivational themes. Murray (1943) originally intended the TAT as an indicator of the various motivations he referred to as needs, and evidence consistently supports the validity of the instrument for this purpose. An article by McClelland, Koestner, and Weinberger (1989) on the measurement of achievement motivation sparked much of the current interest in the TAT as a motivational measure, especially after a meta-analytic review by Spangler (1992) concluded that the TAT was a better predictor of achievement motivation than self-report measures. However, Entwisle (1972) questioned whether intelligence could account for the relationship, and this issue remains unresolved. Research into the TAT as a predictor of the motivation for power, affiliation, and intimacy (McAdams, 1982; McClelland, 1965, 1975) has produced similar evidence of validity, although studies of incremental validity are almost nonexistent (Winch & More, 1956).
Problem solving. The TAT has been studied extensively as an indicator of the ability to solve problems effectively, using the Personal Problem-Solving System—Revised (Ronan, Gibbs, Dreer, & Lombardo, 2008). The system has been validated in a number of studies, some of which controlled for intelligence as a possible confound.